Sampling for Rice* Yield in Bihar and Orissa.

(Received for publication on 10th November 1926.)

The need for reasonably accurate statistics of the human population
of a country has long been recognized throughout the civilized world. 
The idea that statistics of the production which supports that popula- 
tion are of well-nigh equal importance is rapidly gaining general 
recognition. It is, therefore, perhaps superfluous to argue in its favour, 
but it may be useful to indicate what objects a well-founded estimate of 
the yield of rice can fulfil. 

In the first place, since rice occupies a little more than half of the
cultivated area in the province of Bihar and Orissa, employs four-fifths of the total
population for six months of the year and feeds the greater part of it 
throughout the year, its supreme importance in any survey of production 
can scarcely be denied. It probably provides at least one-half of the rural 
contribution to the “provincial dividend”, and that contribution is
still vastly more important than that of the urban and industrial 
workers. Hence if a survey of production is desirable, good statistics 
of rice yield are essential. 

Apart from the uses to which those who direct or influence the policy 
of the State may put the statistics of production, there is no doubt that 
traders want information of crops as they come forward. Good informa- 
tion enables them to operate with more confidence and not only saves 
them from alternating between prosperity and financial disaster, but 
greatly benefits the public by stabilizing prices. For this purpose it is not
enough to know what is the production of the whole province, but knowledge
of the conditions of the districts or even smaller units is very valuable.
Lastly, reasonably accurate knowledge of the real normal yields
of important crops even for fairly large tracts would give guidance to
courts who are required to estimate the value of produce for the decision 
of rent suits, for commuting produce rent or for settling fair cash rents.
Hitherto lack of such knowledge has frequently led to considerable
injustice.

* I have used the word “rice” throughout as the ordinary description of the crop.
As a matter of fact the samples taken are, strictly, “paddy”, i.e., unhusked rice.

The statistics of area under the rice crop are claimed to be fairly
accurate. They are founded ultimately on the figures obtained from
detailed measurement in a particular year on the ground. No doubt new land
has been brought under rice in some places, the year of survey may have 
been a particularly favourable or unfavourable one, or to a limited extent 
other crops may have replaced rice. Allowance is made for these factors 
year by year, and perhaps there is no very large error. The question is 
examined further in the latter part of this bulletin. But in the figure for 
yield there is certainly room for bad mistakes. This has been fixed under 
the advice of the Director of Agriculture at a definite figure as normal, 
and is varied year by year by applying for each district a percentage, 
indicating how far the crop of the year falls below or rises above 
the normal. Now in the first place it is exceedingly probable 
that the normal rice yield differs materially between districts. But,
apart from that, there is no machinery for fixing with any reasonable 
accuracy the percentage of the year. All that is done at present 
is for the local police officers to make a guess, at which in 
succession the Subdivisional Officer, the District Officer, and the 
Director of Agriculture guess again. When it is considered that the 
percentage depends ultimately on the effect of the weather on very 
various soils, cultivated with varying degrees of skill and enterprise, 
planted with different kinds of rice, protected by irrigation works of 
greater or less efficiency or completely unprotected, liable to or immune 
from crop pests, and finally harvested over a period of nearly three 
months, it becomes apparent that the guessing ability of the officers 
concerned has to be remarkable. Unfortunately, too, not one of 
them has the least chance of finding out whether his guess was fairly 
right or wildly wrong. Hence the existing statistics of rice pro- 
duction are, I believe, the result of applying to a fairly accurate 
figure of area, an arbitrary standard of normal yield, and a pure 
guess of the condition of the year. I am convinced that by an expendi- 
ture of a sum of money, that is not excessive in view of the import- 
ance of the matter, the last two and perhaps the first could be replaced 
by a scientific estimate, in which the possibilities of error can be gauged. 
Incidentally the scheme proposed, though it would not completely solve 
the problems of the courts to which reference has been made, would go 
a great way to do so. 

I return to the point of the normal yield. This is based on a number
of crop-cutting experiments made at different times in different places,
and it is to the defects of these experiments, with the consequent
uncertainty of the conclusions based on them, that I now wish to draw
attention.

The commonest method of estimating yield, apart from pure guessing
from the appearance of the field at harvest time, is that described in the
standard account of the danabandi (appraisement) system of rent payment.
It consists of the tenant cutting a small area where the crop is
thinnest and the landlord where the crop is heaviest, mixing the two 
together and taking the result as a fair sample. To secure accurate 
results this presupposes that the true mean lies midway between the 
highest and lowest sample, a condition which is very rarely found in practice.
The results so obtained are almost always far too high, i.e., unfair
to the tenant. In an acre field of rice bearing a mean yield of 15 maunds
it might well be possible to find an area of 1/20th of an acre yielding 40
maunds an acre, but it would be obviously impossible to find another
bearing minus 10 maunds an acre. The method is not far different from 
attempting to estimate the average income of a population by taking 
half the sum of the incomes of the poorest man and the richest man 
in it. 
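
The upward bias of this method can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it supposes a field whose plot yields follow a right-skewed gamma distribution with a mean of 15 maunds an acre (the distribution, its parameters and the function names are all hypothetical, not figures from the experiments), and compares the true mean of the plots with half the sum of the thinnest and the heaviest plot.

```python
import random
import statistics


def midrange_vs_mean(n_plots=20, mean_yield=15.0, shape=3.0, seed=1):
    """Return (mean of all plots, midpoint of thinnest and heaviest plot).

    Yields are drawn from a gamma distribution, an assumed stand-in for a
    right-skewed field in which no plot can yield less than zero.
    """
    rng = random.Random(seed)
    scale = mean_yield / shape  # the mean of a gamma variate is shape * scale
    plots = [rng.gammavariate(shape, scale) for _ in range(n_plots)]
    true_mean = statistics.fmean(plots)
    midrange = (min(plots) + max(plots)) / 2
    return true_mean, midrange


def average_bias(n_fields=2000):
    """Average of (midrange - mean) over many simulated fields."""
    diffs = []
    for seed in range(n_fields):
        mean, mid = midrange_vs_mean(seed=seed)
        diffs.append(mid - mean)
    return statistics.fmean(diffs)


if __name__ == "__main__":
    print(f"average overstatement: {average_bias():.2f} maunds an acre")
```

Under these assumptions the midpoint of the two extreme cuts comes out above the mean on average, because the heaviest plot can exceed the mean by far more than the thinnest plot can fall below it.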

A second method is that at present prescribed. An officer of some
standing visits the tract, of whose yield he is required to form an estimate,
and selects a fairly large area, usually 1/10th of an acre, as containing an
average crop. This he cuts and carries and in due course threshes and
weighs. If he has time he makes two or more experiments, but
it is obvious that he cannot usually manage to make a large number
of such experiments. There are two main objections to this method:
the first is that it depends entirely for its accuracy on the ability
of the officer to select. He has to make his selection from a
large number of fields growing different varieties at different points
of maturity. It is very easy to be misled as to yield by the strength
of straw, heavy straw being by no means always correlated with
heavy yield of grain. Again it is difficult to give proper weight
to the fields which will give little or no yield at all, and even in a
normal year such fields are by no means rare. These factors make
accurate selection very difficult. The second objection is even more
important, viz., that there is no possible way of estimating what is the
probability that the result of such selections is within a given range
from the true mean yield. The method is comparable to estimating
the average income of the population of a town by watching the streets
for a few days and then picking out a man, who looked to be in average
circumstances and discovering what his income is.

The only way in which a satisfactory estimate can be formed is by
as close an approximation to random sampling as the circumstances
permit, since that not only gets rid of the personal element of the
experimenter, but also makes it possible to say what is the probability that the
result of a given number of samples will be within a given range from the
true mean. To put this in definite language, it should be possible to find
out how many samples will be required to secure that the odds are at
least 20 to 1 on the mean of the samples being within one maund of the
true mean.
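
For simple random sampling this question has a direct arithmetical answer, which the following sketch works out. It assumes that the mean of the samples is normally distributed with standard deviation equal to that of the population divided by the square root of the number of samples; the function name is hypothetical, and the standard deviation of 9.26 maunds is merely the figure quoted later in this bulletin for the Santal Parganas, used here as an illustrative input.

```python
import math
from statistics import NormalDist


def samples_needed(sd, half_width=1.0, odds=20.0):
    """Number of simple random samples needed so that the odds are
    `odds` to 1 that the sample mean lies within `half_width` of the
    true mean.

    Assumes the sample mean is normally distributed with standard
    deviation sd / sqrt(n).
    """
    prob_inside = odds / (odds + 1.0)  # odds of 20 to 1 mean a chance of 20/21
    z = NormalDist().inv_cdf((1.0 + prob_inside) / 2.0)
    return math.ceil((z * sd / half_width) ** 2)


if __name__ == "__main__":
    # 9.26 maunds is an illustrative standard deviation, not a prescription.
    print(samples_needed(sd=9.26))
```

With a standard deviation of this order the answer is a few hundred samples, which is of the same order as the 12 centres of 20 to 40 cuts each discussed further on.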

This is the problem to be solved. Before describing the progress
made towards a solution, it is desirable to explain some terms used in
modern mathematical statistics, on which the solution depends. These
will, no doubt, be quite familiar to many of my readers, to whom I
apologize for the digression. The first term is the arithmetical “mean”,
which is the real object of our search. It is, of course, the sum of all
the individual results of the cuts or samples made divided by the
number of cuts or samples. The true “mean” is the figure which
would be obtained if the number of samples were increased, so as to
cover the whole subject matter of sampling exhaustively. The whole
subject matter of sampling is known as the “population”, viewed as
made up of a very large number of samples with varying values. The
manner in which the values of the samples are numerically arranged
is called the “distribution”. Clearly the distribution may in some
populations be very much wider than in others. The number of samples
having a given value, or having a value lying within given limits, is
called the “frequency” for that value or for the range of those limits.
The range of limits is known as the “class interval”, and, for convenience
of calculation, values of samples, when these vary continuously, are
lumped together in equal class intervals and treated as if they all fell
at the middle point of their class interval. A most important term
is the “standard deviation” of the distribution. This is a measure
of the amount of dispersion or “scatter” about a particular value,
usually about the mean. It is a perfectly definite value expressed in
the same unit as that in which the value of each sample is expressed.
It is obtained as follows. The value of each sample differs by a definite
figure from the particular value, about which the dispersion is sought,
e.g., from the mean. This difference is squared and (where the
samples have been lumped together in class intervals) multiplied
by the frequency of the class interval. The results are added
together and divided by the sum of all the frequencies, and the square
root of that result is the standard deviation. A simple illustration
will explain the process. Suppose that 8 samples give the following
results.
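
The calculation just described can be sketched in a few lines. The eight values used below are invented purely for illustration and are not the figures of the original table; the function name is likewise hypothetical.

```python
import math


def grouped_mean_and_sd(frequencies):
    """Mean and standard deviation from {class mid-point: frequency}.

    Follows the steps in the text: each deviation from the mean is
    squared, multiplied by the frequency of its class, the products are
    summed and divided by the total frequency, and the square root taken.
    """
    total = sum(frequencies.values())
    mean = sum(value * freq for value, freq in frequencies.items()) / total
    sum_sq = sum(freq * (value - mean) ** 2 for value, freq in frequencies.items())
    return mean, math.sqrt(sum_sq / total)


if __name__ == "__main__":
    # Eight invented sample values (maunds an acre), grouped by value.
    example = {10: 1, 12: 2, 14: 3, 16: 1, 20: 1}
    mean, sd = grouped_mean_and_sd(example)
    print(f"mean = {mean:.2f}, standard deviation = {sd:.2f}")
```

For these invented values the squared deviations sum to 64, so that the standard deviation is the square root of 64/8.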
to divide up the map into equal squares, in order to pick out the villages
in which samples were to be taken, and then to sample the fields bearing
the survey number 50, or the nearest fields planted with rice. For
this method it was necessary to find out the date on which the crop 
on each field would be ready for harvest and to arrange to visit the village 
on that date. The practical difficulties involved made me search for 
another method, more automatic in the distribution of sampling in 
time and space. It is unfortunately impossible to use the result of 
Mr. Dobbs’ experiment, because three samples were taken in each field 
and the results lumped together. This has disguised the frequency 
distribution so much as to make deductions extremely hazardous. 

In 1923 with the help of Mr. Davies, the Settlement Officer of the 
Santal Parganas district, 400 random samples were obtained in a part 
of Godda thana. The sampler spent from the 4th December to the 
22nd December travelling from village to village through the area comprising
about 100 square miles. Wherever he found, on the day of his
visit, the crop being actually harvested, he took a sample. In the 
following year the same method was applied under the supervision of 
the Settlement Officers, Messrs. Davies, Mansfield and Gokhale, in the 
Godda thana of Santal Parganas, the Jajpur thana of Cuttack, and 
the Purulia thana of Manbhum. The only difference was that four
samples instead of one were taken, where practicable, on each field in 
which the harvest was in progress. 

In 1925 I obtained the permission of the Bihar and Orissa Govern- 
ment to continue and expand the experiment in Santal Parganas and 
Orissa. An officer was employed in each of eight subdivisions, averaging
nearly 1,000 square miles in extent, and in each some 15 centres
were taken, so as to cover as widely as possible the subdivisional area. 
At each of these the officer was to spend four days, moving out on the 
first day two miles north, turning to the right and returning from the 
north-east, and on the succeeding days traversing the other three quad- 
rants in the same manner. The aim was to distribute the sampling 
as regularly as possible both in time throughout the period of harvest 
and in space over the area sampled. The limitation in distance to be 
traversed secured that the number of samples obtained would be few 
in tracts where harvesting was sporadic and many in tracts where it 
was in full swing, thus correcting any tendency to over-emphasize the 
yields from relatively unimportant types. The instruction to take four
samples from each field was withdrawn for reasons given below. For 
the supervision of this year’s sampling I am indebted to Messrs. Mansfield 
and Houlton.

In all three years the actual process of taking each sample has been
the same. The apparatus consists of a small solid block of wood on
which are fixed four pairs of guides defining two lines at an angle of 60°
to one another. Along these lines through the guides are pushed two
narrow wooden battens, which pass among the stalks of the paddy. Each
bears a stop, so that, when this rests against the first pair of guides,
the length from the point where the battens cross to the slots near the
ends of the battens is 67¼ inches. From one of these slots a third batten
is pushed through the stalks to the slot on the other batten, and, when
this third batten is fixed in the slots, an equilateral triangle is formed
comprising an area of 1/3,200 of an acre. The paddy stalks just outside
the battens are bent back, and the crop inside cut and threshed out
there and then. The grain is placed in a numbered bag and dried for
three or four days and then weighed. The size of the triangle has been
chosen to give one maund an acre for every standard tola of the sample,
while the triangular shape is intended to avoid the error caused by either
including or excluding a whole line of the planted paddy. The weight
can be accurately determined to half a tola in ordinary country scales
by double weighing. The sampler is instructed to go to the centre
of the ail (field boundary) roughly parallel to the line on which the harvest
is proceeding and take three paces into the field before laying down
the solid block of wood which guides the battens. He is warned against
the mistake of putting the block down so that the triangle of battens
encloses it instead of excluding it, a mistake to which beginners are
liable.
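
The arithmetic behind the size of the triangle can be checked as follows. This is a sketch under two assumptions that are not spelled out in the text: an acre of 43,560 square feet, and the standard maund of 40 seers of 80 tolas each (3,200 tolas). The function names are hypothetical.

```python
import math

SQ_INCHES_PER_ACRE = 43_560 * 144   # assumed: 43,560 square feet to the acre
TOLAS_PER_MAUND = 80 * 40           # assumed: 80 tolas to the seer, 40 seers to the maund


def triangle_area_sq_in(side_in):
    """Area of an equilateral triangle of the given side, in square inches."""
    return math.sqrt(3) / 4 * side_in ** 2


def side_for_fraction_of_acre(denominator):
    """Side (inches) of the equilateral triangle covering 1/denominator acre."""
    area = SQ_INCHES_PER_ACRE / denominator
    return math.sqrt(4 * area / math.sqrt(3))


def maunds_per_acre(sample_tolas, denominator=3200):
    """Yield per acre implied by a sample weight in tolas from 1/denominator acre."""
    return sample_tolas * denominator / TOLAS_PER_MAUND


if __name__ == "__main__":
    print(f"side for 1/3,200 acre: {side_for_fraction_of_acre(3200):.2f} inches")
    print(f"a 15-tola sample implies {maunds_per_acre(15):.1f} maunds an acre")
```

On these assumptions the required side comes to a little over 67 inches, and because the fraction of an acre (1/3,200) matches the number of tolas in a maund, the weight of the sample in tolas is read directly as maunds an acre.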

The results obtained in the last three years are given in Tables I
and II. Besides those, to which reference has already been made, there
are a set of samples taken in 1925 on the Kanke Agricultural Farm in
Ranchi district and a set taken in the Atri thana of Gaya district in
the same season. The system has thus been tried in the Orissa deltaic
tract, on the Chota Nagpur plateau of which Ranchi forms a part, in
the broken country of Santal Parganas and Manbhum, and in the
alluvial Gangetic plain of North Santal Parganas and Gaya. It has
been put in practice in tracts where the harvest has been plentiful and
in others where it has been scanty. In 1925 the sampling extended
in seven tracts over the whole period of the rice harvest, excluding
only the very early bhadoi and the summer rice, reaped in March,
which are harvests of very little importance. Conclusions that can
be drawn from the figures may, therefore, be applied with
safety to the whole province except perhaps the North-Gangetic
tract.

I have relegated to the Appendix the technical discussion of the
validity of my conclusions, which will be appreciated only by those 
readers, who have made a somewhat extensive study of statistical 
theory. It is enough to say here that the mean yield on the harvested 
area can, I think, be determined for a tract of about 1,000 square miles 
with an accuracy such that it will be out by more than one maund per 
acre in only one case out of twenty. It will be very seldom out by more 
than a maund and a half. To secure such determination, it is necessary 
to fix 12 centres, spread as evenly as possible over the area, and to put 
down against each centre the day on which sampling is to be done. These 
days should be spread evenly over the period of harvest. The sampler 
should go out a fixed distance in one direction and, circling round, return 
from another direction, so that it is secured that he covers approximately
the same area on each day. He should cut one sample from each
field, where he finds harvesting in progress. He will be able ordinarily 
to get from 30 to 40 samples in a day when the harvest is in full swing 
and 10 to 20 when it is slack. It is certainly not impossible to supply 
samplers for this work, which is extremely simple once it is explained. 
The men employed hitherto have been young men fresh from college. 
Any gazetted officer and most other officers could do it quite well. In 
fact, I have found that peons attached to the sampling officers fully 
understand what has to be done after a week or so. It is not in the 
least essential that the same officer should carry out all the sampling 
of a tract. 

The method is applicable to small or large tracts. The same 
number of centres will probably give a slightly greater degree of accuracy 
for a smaller and a slightly less degree for a larger tract, because the 
standard deviation on the whole increases with the size of the tract, 
though not at all rapidly. Thus the standard deviation of the whole 
of the Santal Parganas is only 9.26 maunds, actually less than that of
Deoghar, though greater than that of any other subdivision. To deter- 
mine the mean yield of a village or an agricultural farm it would probab- 
ly be advisable to make at least 10 cuts on each of 10 days. It would 
be better to make at least 20 cuts. (It may be noted that the standard 
deviation of Kanke in 1925 was nearly the highest figure found so far.) 

The method could also be applied to other crops, especially cereals. 
It would be more difficult to apply it to crops, for which the marketable 
produce is obtained by processes more elaborate than threshing, or 
is bulky and awkward to weigh, such as jute or sugarcane. It is also 
an advantage if the crop is common enough to enable a fair number 
of samples to be taken on one day.

It is no advantage to take a large number of samples from places
very close together, where the crops will naturally be very much the
same on the same day. The degree of accuracy is not seriously improved
by such practice. This explains why there is no need to take large
samples instead of the handy samples obtained by my method. A
sample of one-tenth of an acre is merely 320 of my samples taken in
juxtaposition. It simply gives a determination of the mean yield of
that particular field, which is not more effectively accurate than that
given by say four small samples. Even four samples instead of one
are not worth while, because in the great majority of cases they do not
differ among themselves enough to affect the mean or the standard
deviation of the whole set of samples. This may be illustrated from
the columns for “all classes” and “1st cutting” for Santal Parganas,
Godda thana 1924. Technically speaking, there is very high correlation
between the individuals of such groups of samples, which makes the
ordinary rule, that the standard deviation of the mean is the standard
deviation of the population divided by the square root of the number

[illegible] to have a fairly level crest, the frequency not varying [illegible]
four or five class intervals around the mean. This emphasises the
[illegible] of “selecting an average field” [illegible] experimenting. There
will be nearly as many places [illegible] maunds above or below the mean,
as there will be within one maund of it. Thus from the very nature of
[illegible], from the physical impossibility of surveying [illegible]
simultaneously or nearly so, and the personal error of the [illegible], the
present method of estimating is likely to [illegible].

My method gives a reasonably close [illegible] yield of the harvested area.
Consequently it is evidently necessary to determine what is that [illegible]
not the area planted or sown, since in [illegible] fields [illegible]
bear no crop worth cutting. It also excludes [illegible],
which are usually measured in with the [illegible].
Since the cadastral survey figure is the [illegible] of
the year, this second point is of importance. [illegible]
determination of the area occupied by [illegible], but
it would appear to vary between nearly [illegible] in hilly country to
scarcely more than 2 per cent. in level country. It is usually taken
as 5 per cent.

It is much more difficult to say what is the range of variation between
the planted (or sown) area and the harvested area. In a bad year, 
when the rains stop early in September, it may well be 50 per cent, 
all over a subdivision. The planted area will also vary from year to
year. At present we depend on estimates, which are originally those of
the village chaukidar, though they are much more capable of intelligent
check by higher authorities than estimates of yield are. It is possible,
for instance, by driving through a tract in a motor car to get some idea
of the extent to which the rice crop is going to fail altogether. It is 
usually believed that the estimate of cropped area is sufficiently 
accurate. It may be, but the belief is intuitive rather than rational. 

It would certainly be a great advantage if the harvested area could 
be obtained by some form of sampling. A method was tried in 1925 
in Santal Parganas. The results, if not very encouraging, are interesting.
The plan was to make the sampler march from centre to centre across
country as near as he conveniently could in a straight line. After certain
intervals of time he had to count 100 paces and note how many of these
ended on harvested rice land. (There is no serious difficulty in doing
this in January and early February, since little rice land is reploughed
by then.) Each count of 100 would give a definite percentage, in many
cases 0, in many nearly 100, the difference from the full 100 being due
to the presence of field ridges. The mean of these percentages would
give the percentage of harvested rice land in the total area of the tract
sampled. Six men were employed for nearly a month each and between
them made 1071 counts. From these a mean of 47.22 per cent. was
obtained. The standard deviation was 37.10. This means that it is
about 21 to 1 that the true percentage is between 49.4 and 45. The
accuracy obtained is hardly sufficient to justify the employment of
six men for a month in each district. Further the work is strenuous
and tedious, and it is probable that it would be in practice shirked, and
results fudged. The difficulty of excluding large patches of thick jungle 
or other inaccessible country both from the sampling and the sampled 
area has yet to be overcome. But it is still possible that some method 
on similar lines may prove practicable. 
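
The limits quoted for the percentage of harvested rice land can be reproduced by a short calculation. The sketch below assumes simple random sampling and a normally distributed mean, and takes the mean as 47.22 per cent. (the midpoint of the quoted limits, 45 and 49.4, is 47.2); the function names are hypothetical.

```python
import math


def mean_interval(mean, sd, n, z=2.0):
    """Limits of mean +/- z standard errors, with the standard error itself."""
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se, se


def odds_within(z):
    """Odds (to 1) that a normal variate lies within z standard deviations."""
    p = math.erf(z / math.sqrt(2))
    return p / (1 - p)


if __name__ == "__main__":
    low, high, se = mean_interval(mean=47.22, sd=37.10, n=1071)
    print(f"standard error = {se:.2f} per cent.")
    print(f"limits at two standard errors: {low:.1f} to {high:.1f} per cent.")
    print(f"odds of lying within two standard errors: {odds_within(2.0):.0f} to 1")
```

Two standard errors on either side of the mean correspond to odds of about 21 to 1, which agrees with the statement in the text.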

There is just one other caution which I would address to persons 
who have to estimate gross yields. That is that yields of rice are
ultimately based, however imperfectly at present, on yields of paddy. A
standard figure for the loss of weight by husking is taken, usually
one-third. It is probable that this varies a good deal with the kind of paddy,
and there is good reason to believe that in Orissa the loss is considerably
more than one-third. It would be well if the Agricultural Department
investigated this point on their farms. If any kind of paddy
husks out much below the usual, it is clearly not such a good economic
proposition as an excess of unhusked yield might suggest. Possibly
the plant breeders may be able to produce a paddy which loses less
than one-third in husking, and, if so, the benefit might well be as great
as that obtained from increased yield.

My general conclusions are that sampling by my method on 12 days
throughout the harvest from different centres would secure a figure
for the mean yield of a harvested area of 1,000 square miles correct
within one maund in about 95 per cent. of cases, and that with the existing
revenue and agricultural staff there would be no serious difficulty in
substituting this method for the unscientific method of sampling now
prescribed.

To the officers I have mentioned, under whose direction the samples
were taken, and to the staff employed by them on the work I offer my
cordial thanks, as I do to Mr. Dobbs for his advice and encouragement.

APPENDIX.

It is often assumed that if a number ($n$) of samples are taken and the
standard deviation of the whole population ($\sigma_0$) is known, then the
standard deviation ($\sigma_m$) of the mean of $n$ samples is given by
$\sigma_m^2 = \sigma_0^2 / n$. This, however, presupposes (1) that the samples
are taken by the process of “simple” or “random” sampling, i.e., that every
portion of the population is equally likely to be sampled every time, and (2) that
the distribution of the population is such that the distribution of the
means of $n$ samples closely approximates to the “normal” curve of
error.

I have attempted to investigate both these points before recommending
the method already tried for general application. The application of
“simple” sampling to the rice crop would involve picking out the places
for sampling by some random method such as putting numbered cards
representing all the rice plots of the area in a box and drawing out the
required number of cards, and then arranging to visit the plots at the
time when the crop is ripe. Such a method is, I consider, impracticable.
My method is practicable but is not “simple” sampling. There
is some degree of selection. It has to be seen whether the adoption of it
in place of “simple” sampling increases or decreases the rate at which
the standard deviation of the mean diminishes as the number of cuts
rises.

If an equal number of cuts were taken on each day, say 20, the process
would be equivalent to (1) taking one cut from each separate tract to make
up a sample, and then (2) taking the mean of 20 such samples; the “tract”
being the country which the sampling officer can cover from a centre.
Process (1) theoretically gives a set of samples with means having a
standard deviation given by

$$\sigma_1^2 = \frac{\sigma_0^2 - s_m^2}{n},$$

where $\sigma_0$ is again the standard deviation of the population, $s_m$ that
of the mean values of the tracts and $n$ the number of the tracts. Process (2)
is simple sampling of the means of the samples obtained by process (1), and
will give a mean with a standard deviation given by

$$\sigma_k^2 = \frac{\sigma_1^2}{k},$$

where $k$ is the number of samples grouped together (in this case 20).

To test the theory I have taken from the original record of Rajmahal
1925 the first eight cuts of each day, writing the 32 cuts of the first four
days in one row, the 32 cuts of the second four days in the next row, and so
on. (In a very few cases, where less than eight cuts were taken in a day,
I have borrowed from the last cuts of the previous day or the following
day.) I have thus obtained 32 columns with 11 figures in each, corresponding
to the 11 centres, each surrounded by four quadrants. I have totalled
the columns and taken the means. I have then taken the means of
successive pairs, fours, and eights of the means so derived. I have also
calculated $s_m$ by taking the means of the cuts made from each centre. I have
done the same thing for Bhadrak 1925, which had 12 centres instead of 11.
The results are as follows (standard deviation in maunds).

                        $\sigma_0^2$   $s_m^2$   $\sigma_1^2$   $\sigma_2^2$   $\sigma_4^2$   $\sigma_8^2$

Rajmahal theoretical       69.00      34.21    [illegible]   [illegible]   [illegible]   [illegible]
    ”    observed           ..         ..      [illegible]   [illegible]   [illegible]   [illegible]

Bhadrak theoretical        33.18       9.40       1.98       [illegible]   [illegible]   [illegible]
    ”    observed           ..         ..         1.10          .74        [illegible]   [illegible]

(The theoretical value of $\sigma_1^2$ is $\sigma_0^2 - s_m^2$ divided by 11 for
Rajmahal and 12 for Bhadrak. The theoretical values of $\sigma_2^2$, $\sigma_4^2$
and $\sigma_8^2$ are one half, one quarter and one-eighth of $\sigma_1^2$.)

The observed results agree sufficiently well with the theory, and it 
seems quite clear that the standard deviation is diminished and not
increased by taking samples on this plan instead of by simple sampling.
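
The theoretical figures follow from the rule stated above and can be recomputed from the two quantities given for each tract. The sketch below assumes that the first two figures in each theoretical row (69.00 and 34.21 for Rajmahal, 33.18 and 9.40 for Bhadrak) are the squares of the population standard deviation and of the standard deviation of the tract means; the function name is hypothetical.

```python
def theoretical_variances(var_population, var_tract_means, n_centres,
                          groupings=(1, 2, 4, 8)):
    """Squared standard deviation of the mean for each grouping size.

    One cut from each of n_centres tracts gives
    (var_population - var_tract_means) / n_centres; averaging k such
    rounds divides that figure by k.
    """
    var_one_round = (var_population - var_tract_means) / n_centres
    return {k: var_one_round / k for k in groupings}


if __name__ == "__main__":
    rajmahal = theoretical_variances(69.00, 34.21, n_centres=11)
    bhadrak = theoretical_variances(33.18, 9.40, n_centres=12)
    for name, values in (("Rajmahal", rajmahal), ("Bhadrak", bhadrak)):
        row = "  ".join(f"k={k}: {v:.2f}" for k, v in values.items())
        print(f"{name}: {row}")
```

For Bhadrak the single-round figure computed in this way agrees with the 1.98 printed in the table.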

But this plan would require an equal number of cuts on each day, and
this would certainly overweight the slack periods of harvest. My method
avoids this, but by so doing it almost certainly increases the standard
deviation of the mean as compared with that obtained by the plan just
described. I cannot find any way of determining mathematically to
what extent it does so, but the following test shows reasonably well what
happens.

I examine below the means obtained from the cuts made to the
north-east of each centre against those obtained from the cuts made to the
south-east, south-west and north-west. These are given below. (Some cuts
used in the general table were neglected for this purpose, because only one
quadrant was sampled.)

The “failures”, i.e., results falling more than one maund away from
the mean of the four quadrants, are underlined. The actual standard
deviation is that derived from the means of the four quadrants, Sheppard’s
correction being applied. The theoretical standard deviation is the result
of dividing the standard deviation of the whole series of cuts in the area by
the square root of the average number of cuts in the quadrants. The
“failures” look high at first sight, but that is due to the fact that the
number of cuts in a quadrant was often distinctly lower than the standard
of 20 cuts at 12 centres, which would seem to be what is required to ensure
a mean correct within one maund in 95 per cent. of cases. A comparison
of the actual with the theoretical standard deviation shows that on the 
whole my method of sampling gives about as stable a mean as simple 
sampling. The variation in the number of cuts at the different centres 
about wipes out the greater stability gained by distributing the cuts 
to the different centres. 

It is impossible to demonstrate by mathematical processes precisely 
what the odds are, and, without a much wider series of tests than have yet 
been made, empirical deductions must be somewhat uncertain. But I 
think it is quite safe to say that if at least 10 and not more than 40 cuts
were made from each of the 12 centres at even intervals over the harvest
season, the mean yield for an area of about 1000 square miles could be
determined within one maund per acre nineteen times out of twenty. The
“failures” would very seldom be more than 1½ maunds out. This depends
on the assumption that the frequency distributions actually found
are such that the means of a large number of cuts will form a frequency
distribution closely approximating to the “normal curve of error”.
Where the sampling is “simple”, this is certainly true of populations
following the frequency distributions obtained. If, for example, 400
samples were taken by “simple” sampling from populations like the
Deoghar 1925 and the Godda 1925 harvests, the spread of the mean of the
400 samples would compare with the spread of the corresponding “normal”
curve (i.e., that having the same standard deviation) as follows.

Odds that the mean of 400 samples falls inside the limits, to its falling outside.

Limits.            Deoghar.           Normal.

±0.425 maunds      1125 to 1000       1138 to 1000
±0.850   „          585 to 100         583 to 100
±1.275   „          364 to 10          330 to 10
±1.700   „          342 to 1           272 to 1

Limits.            Godda.             Normal.

±0.3428 maunds     2517 to 1000       2531 to 1000
±0.5142  „          822 to 100         837 to 100
±0.6856  „   [illegible] to 100       3050 to 100
±0.8570  „         1357 to 10         1375 to 10
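
The “normal” odds in a comparison of this kind depend only on the ratio of the limit to the standard deviation of the mean. A minimal Python sketch follows; the function name is hypothetical, and no attempt is made to reproduce the figures for the actual Deoghar and Godda distributions, which need the higher constants as well.

```python
import math

def normal_odds_inside(limit, sd_of_mean):
    """Odds that a normally distributed mean falls within +/- limit
    of the true mean, against its falling outside."""
    z = limit / sd_of_mean
    inside = math.erf(z / math.sqrt(2.0))  # P(|Z| < z) for a standard normal
    return inside / (1.0 - inside)

# One standard deviation either way, and the familiar 1.96 limit.
odds_one_sd = normal_odds_inside(1.0, 1.0)
odds_nineteen_to_one = normal_odds_inside(1.96, 1.0)
```

A limit of 1.96 times the standard deviation of the mean gives odds of very nearly 19 to 1, which is the “nineteen times out of twenty” standard used throughout.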





It cannot then be said with certainty what effect on the rapidity of 
the approach to normality would arise from the substitution of my method 
of sampling for “simple” sampling. But as it has little or no effect of 
diminishing or increasing the standard deviation of the mean, it probably 
has the effect of bringing the higher constants about as quickly towards 
their values for the normal curve. The nearest approach to a discussion 
of this problem is to be found in a paper by “Student” in Vol. VII, pages 
210 to 214, Biometrika. But the problem before him was not precisely 
mine, and he distinctly states that his problem is essentially a “small 
sample” problem. It deals with means of 25, not as mine with means of 
250 or so. The higher constants for some of the distributions are given 
below (Professor Pearson’s notation). 







                               β₁             β₂

Godda 1925                    .527           4.116
Deoghar 1925                 1.227           4.066
Pakaur 1925                [?].548         [?].349
Godda 1924, 1st cuttings   [illegible]       2.729
Jajpur 1924, 1st cuttings     .694           3.[illegible]
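
For reference, Professor Pearson’s constants are β₁ = μ₃²/μ₂³ and β₂ = μ₄/μ₂², where μ₂, μ₃ and μ₄ are the moments about the mean; for the normal curve β₁ = 0 and β₂ = 3. A short Python sketch of the calculation from raw observations is given below. The function name and the sample are hypothetical, and no Sheppard’s correction is applied, since that belongs to grouped data.

```python
def pearson_constants(values):
    """Return (beta1, beta2) from the moments about the mean.
    No Sheppard's correction is applied (raw, ungrouped values)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    beta1 = m3 ** 2 / m2 ** 3
    beta2 = m4 / m2 ** 2
    return beta1, beta2

# A small symmetric sample, invented for illustration.
beta1, beta2 = pearson_constants([1, 2, 3, 4, 5])
```

The square root of β₁ is μ₃/σ³, so a symmetric sample such as this one gives β₁ = 0.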


Professor Bowley’s tables on pages 271 and 303 of his “Elements 
of Statistics” enable one to show how the observed mean is likely to lie 
about the true mean. Assuming 250 samples for each tract, 10,000 
trials would be distributed thus: 

(κ is Professor Bowley’s measure of skewness and is μ₃/σ³ in 
Professor Pearson’s notation.) 


                                    Not more than    Not more than    More than
                                    1 maund          1 maund          1 maund
Tract.             σ        κ       above mean.      below mean.      from mean.

Godda 1925         6.63     .64     4,888            4,956            156
Jamtara 1925       7.70     .78     4,752            4,844            404
Pakaur 1925        6.54     .86     4,900            4,982            118
Rajmahal 1925      8.31     .26     4,712            4,752            536
Deoghar 1925      11.59    1.11     3,999            4,225            [illegible]
Dumka 1925         8.15     .59     4,706            4,782            [illegible]
Jajpur 1925        9.09    1.51     4,536            4,711            [illegible]
Bhadrak 1925       5.76     .69     4,943            5,003            [illegible]



It may, of course, be argued that my method, even if the same or 
nearly the same mean were obtained in a large number of repetitions, 
may be defective owing to bias. That must remain a matter of opinion. 
The method does, however, secure that the cutting is well distributed 
in space over the area to be sampled and in time over the period of the 
harvest. By laying down that approximately the same area shall be 
under sample on each of the days of sampling, it weights the importance 
of the contribution made to the mean by the harvest of the day. A day 
in the height of the harvest will contribute 30 to 40 samples ; in the 
slacker time only 10 to 20. There is no other obvious form of bias, 
unless, of course, centres are selected with grave disregard to their 
suitability. 

If it be admitted that the distribution of the means of a fairly large 
number of cuttings will closely approximate to a “normal curve of error” 
and that the standard deviation of the means is approximately that of the 
population divided by the square root of the number of cuttings, then the 
degree of accuracy can be calculated. Leaving aside the standard 
deviations of rice grown on particular classes of land, which are sometimes very 
low, it may be taken that the range of the standard deviation of rice 
in Bihar and Orissa is from 4 to 12 maunds. At the lower limit, 240 
cuttings would give means of which only 15 out of 100,000 would be one 
maund or more away from the true mean. At the upper limit about 19 
out of 100 would be so divergent, and about 5 out of 100 would be more 
than 1½ maunds away. With a standard deviation of 8 maunds, 240 
cuttings would give a mean correct within one maund about 95 times out 
of 100. This last is the basis of my scheme of 12 centres, from which may 
be expected at least 240 cuttings, but probably as many as 300. Where 
experience shows that the standard deviation of a tract is much higher 
than 8 maunds, the number of centres should be increased, if it is desired 
to obtain the same degree of accuracy. 
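
The calculation described in this paragraph can be sketched in Python under the same two assumptions, namely a normal distribution of the mean and a standard deviation of the mean equal to that of the population divided by the square root of the number of cuttings. The function names are hypothetical, and the factor 1.96 for the 95 per cent. standard is the usual normal value rather than a figure quoted in the text.

```python
import math

def prob_mean_off_by(sd, cuttings, margin=1.0):
    """Probability that the mean of `cuttings` samples lies `margin`
    maunds or more from the true mean (normal approximation)."""
    sd_of_mean = sd / math.sqrt(cuttings)
    return math.erfc(margin / (sd_of_mean * math.sqrt(2.0)))

def cuttings_needed(sd, margin=1.0, z=1.96):
    """Cuttings required for the mean to be within `margin` maunds
    with the confidence implied by z (1.96 is assumed for 95 per cent.)."""
    return math.ceil((z * sd / margin) ** 2)

upper_one_maund = prob_mean_off_by(12.0, 240, 1.0)   # upper limit of the range
upper_one_and_half = prob_mean_off_by(12.0, 240, 1.5)
within_at_eight = 1.0 - prob_mean_off_by(8.0, 240, 1.0)
lower_one_maund = prob_mean_off_by(4.0, 240, 1.0)    # lower limit of the range
```

On these assumptions the upper limit of 12 maunds gives roughly 0.20 and 0.05 for margins of one and 1½ maunds, and a standard deviation of 8 maunds gives a mean within one maund about 95 times out of 100. For 4 maunds the probability is of the order of 1 in 10,000. `cuttings_needed(8.0)` returns 246, close to the 240 cuttings on which the scheme of 12 centres is based.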

It is of interest to consider what degree of accuracy would attach 
to a calculation of the gross yield of the whole province of Bihar and 
Orissa, based on the system I recommend, the harvested area being 
estimated as it is at present. If it may be assumed that the standard 
deviation of the error in estimating the harvested area is about 1/20 of 
that estimate, it is practically certain (1,000 to 3) that the calculation 
of the gross yield is not more than 2 per cent. out. 